Script Extractor 1.3
by Griffin Knodle, a.k.a. Jair, 10/30/98
E-mail: gknodle@trinity.edu
http://fly.to/vale


---------------------------------------------------------------------------
WHAT IT IS
---------------------------------------------------------------------------
Script Extractor will take text from a ROM and put it into an ordinary text file for easy editing.  It supports Thingy-format table files, but has some features of its own too.  It does not support dual tile encoding or text compression, but future versions may.

---------------------------------------------------------------------------
New in 1.3
---------------------------------------------------------------------------
- Adds support for two-byte break values.

---------------------------------------------------------------------------
New in 1.2
---------------------------------------------------------------------------
- Supports the new table features added by Script Inserter.
- Fixes a bug that prevented SE from extracting the last byte of a file.
- Fixes a bug that added a return to the end of the output file.


---------------------------------------------------------------------------
WHAT YOU'LL NEED
---------------------------------------------------------------------------
A ROM to take text from
A table file for converting ROM text into ordinary text


---------------------------------------------------------------------------
WHAT TO DO
---------------------------------------------------------------------------
1. Basic program use
2. Table files
     a. Standard conversions -- "xx=whatever", "xxxx=whatever"
     b. Line, section, and message breaks -- "m: xx" or "/xx"
     c. Commands for Script Extractor
          1. Treatment of unconvertable bytes -- "?0", "?1", "?2"
          2. Style for converting break characters -- "?break: x"
          3. Length limits -- "?line: x", "?sect: x", "?msg: x"
          4. Pointer table address -- "?ptr table: xxxxxxxx"
          5. Overwrite / append to output file -- "?a"
     d. Displaying program control characters -- "prog cont char: xx=label"
     e. Index
3. A final word

---------------------------------------------------------------------------
1.  Basic program use
---------------------------------------------------------------------------
Run the program.  It will ask you what your table file is called.  (If you're a lazy typer, you don't have to type the ".tbl" extension -- it'll put it on for you.)  It will read the table file and warn you if it couldn't understand some of the entries.  Then it will ask you what ROM you're reading from and what text file you want to output to.  (If you just hit return, it'll assume that the files have the same name -- that is, if your table is "zelda2.tbl", it will assume you're using "zelda2.nes" and "zelda2.txt".  I'm a lazy typer.)
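
Here's a rough Python sketch of the file name defaults described above.  It isn't Script Extractor's actual source: the function name `default_names` is made up, and the ".nes" ROM extension is only an assumption taken from the zelda2 example.

```python
import os


def default_names(table_name, rom_ext=".nes"):
    """Guess the table, ROM, and text file names from what the user typed.

    Hypothetical helper; rom_ext=".nes" is an assumption based on the
    zelda2 example, not a documented rule.
    """
    # Add ".tbl" if the user left the extension off.
    if not os.path.splitext(table_name)[1]:
        table_name += ".tbl"
    base = os.path.splitext(table_name)[0]
    # The ROM and text files default to the table's base name.
    return table_name, base + rom_ext, base + ".txt"


print(default_names("zelda2"))
# prints ('zelda2.tbl', 'zelda2.nes', 'zelda2.txt')
```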

Next it will ask for the starting address of your script dump.  (If you don't know how to find text in a ROM, go to my site and work through my Relative Searcher Tutorial.)  Finally, it will ask how many bytes you want to read from the ROM.  Then it will extract the text and print out a few statistics, namely how many characters it converted and how many sections and messages the script is broken into.  (If the text file already exists, it will be overwritten unless your table file tells SE to append instead; see 2.c.5.)
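
The "starting address plus number of bytes" step boils down to a seek and a read.  A minimal Python sketch, assuming the address is a plain byte offset into the file (the name `read_block` is made up for illustration):

```python
def read_block(rom_path, start, count):
    """Read `count` bytes from the ROM file, starting at byte offset `start`.

    Hypothetical helper.  If the file ends early, fewer bytes come back.
    """
    with open(rom_path, "rb") as rom:
        rom.seek(start)
        return rom.read(count)
```

If you ask for more bytes than the file has left, you simply get back whatever is there.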

---------------------------------------------------------------------------
2.  Table files
---------------------------------------------------------------------------
Script Extractor needs a table file to tell it how to convert each byte from the ROM into a character you can read and edit.  Basically, a table file looks like this:

58=A
59=B
5A=C
5B=D
A0=.
A1=!

Note that all values are given in hexadecimal (base 16).  This is how they'll show up if you're looking through the ROM in a hex editor such as Hex Workshop or Thingy.

---------------------------------------------------------------------------
2.a.  Standard conversions
---------------------------------------------------------------------------
The example above tells Script Extractor that a value of 58 hex represents "A", 59 represents "B", A1 represents "!", and so on.  Put the value to convert on the left, then an equals sign, then the text you want it converted to on the right.  You can have more than one letter on the right side.  You can also convert two bytes at once.  For example, these are all perfectly good entries:

AA=a
AB=b
C3=he
0400=Cecil

Another word about two-byte conversions: You can have two-byte conversions and program control characters (see 2.d.) starting with the same value.  The two-byte conversions will take precedence.  For example, in Radia, B3 is a character that marks a compressed string.  B300 is the player's name.  I have these two lines in my table:

program control char: b3=cmp.
b300=[Player]

Instead of converting B300 as "<cmp. 00>", SE converts it as "[Player]".
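
To make the precedence rule concrete, here's a small Python sketch of reading "xx=whatever" entries and converting bytes with two-byte entries tried first.  It's only an illustration, not SE's real code; `parse_conversions` and `decode` are made-up names, and unknown bytes are printed as <xx> as described in 2.c.1.

```python
def parse_conversions(lines):
    """Collect "xx=text" and "xxxx=text" entries into a dict keyed by bytes."""
    table = {}
    for line in lines:
        left, sep, right = line.rstrip("\r\n").partition("=")
        left = left.strip()
        if not sep or len(left) not in (2, 4):
            continue  # not a standard conversion line
        try:
            key = bytes.fromhex(left)
        except ValueError:
            continue  # left side is not hexadecimal
        table[key] = right
    return table


def decode(data, table):
    """Convert ROM bytes to text, trying two-byte entries before one-byte ones."""
    out = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        single = data[i:i + 1]
        if len(pair) == 2 and pair in table:
            out.append(table[pair])
            i += 2
        elif single in table:
            out.append(table[single])
            i += 1
        else:
            out.append("<%02X>" % data[i])  # unknown byte, shown in hex
            i += 1
    return "".join(out)


table = parse_conversions(["AA=a", "AB=b", "C3=he", "0400=Cecil"])
print(decode(bytes([0x04, 0x00, 0xAA, 0xC3, 0xD8]), table))
# prints Cecilahe<D8>
```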

---------------------------------------------------------------------------
2.b.  Line, section, and message breaks
---------------------------------------------------------------------------
Most ROMs have certain values that represent line breaks, section breaks, or message breaks.  (Not all ROMs use all three kinds of breaks.)  A line break just drops down to the next line.  A section break waits for the player to push a button before continuing with the next section.  A message break stops displaying text and returns to the main game.  For example:

Hi, here comes a line [line break]
break. Soon we'll have [line break]
a section break. [section break]
Now I'm talking again. [line break]
But now I'm about to stop. [message break]
Hi!  This is a different [line break]
message, at a totally [line break]
different point in the game! [message break]

I hope that clears it up.  Anyhow, you represent these with "l: xx", "s: xx", and "m: xx", respectively.  If your ROM's line breaks are F0 and its message breaks are F1, your table file would include these two lines:

l: F0
m: F1

The required elements are the first letter, the colon, and the value, so these two lines work the same as the example above:

line break: F0
Message breaks are so cool! :F1

Oh, and some games, like Dragon Warrior 1, seem to use two bytes for breaks.  SE 1.3 supports this.  For example:

sect: FBFD

If your table files are for Necrosaro's Thingy, don't worry.  Script Extractor can also use Thingy's format (line breaks are "*xx", message breaks are "/xx").
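
Here's one way the break lines above could be recognized, sketched in Python.  This is a guess at the rules as described (first letter, colon, value, plus Thingy's "*xx" and "/xx"), not SE's actual parser; `parse_break` is a made-up name.

```python
def parse_break(line):
    """Return (kind, value) for a break line, or None if it isn't one.

    kind is "line", "sect", or "msg"; value is one or two bytes.
    """
    line = line.strip()
    if not line:
        return None
    if line[0] in "*/":
        # Thingy style: "*xx" is a line break, "/xx" is a message break.
        kind = "line" if line[0] == "*" else "msg"
        hex_part = line[1:]
    else:
        # SE style: first letter, then anything, then a colon and the value.
        kind = {"l": "line", "s": "sect", "m": "msg"}.get(line[0].lower())
        head, colon, hex_part = line.rpartition(":")
        if kind is None or not colon:
            return None
    hex_part = hex_part.strip()
    if len(hex_part) not in (2, 4):
        return None
    try:
        return kind, bytes.fromhex(hex_part)
    except ValueError:
        return None


for text in ["l: F0", "Message breaks are so cool! :F1", "sect: FBFD", "*F0"]:
    print(text, "->", parse_break(text))
```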

---------------------------------------------------------------------------
2.c.  Commands for Script Extractor
---------------------------------------------------------------------------
Tables can have lines that don't convert any text, but instead give instructions to Script Extractor.  These command lines all begin with a question mark.

---------------------------------------------------------------------------
2.c.1.  Treatment of unconvertable bytes
---------------------------------------------------------------------------
Script Extractor needs to know what to do if it runs into a value that it doesn't know how to convert.  By default, it will print out the value in hex in the middle of your script dump:

Here comes an<D8>unknown byte.

If you'd like a cleaner dump, you can set it to ignore unconvertable bytes:

Here comes anunknown byte.

Or you can set it to output the byte as an ordinary ASCII character, which will often get messy:

Here comes anØunknown byte.

You set it with a line that's just a question mark and a number: 0 = ignore, 1 = output as <xx> (the default setting), 2 = output as ASCII characters.  For example, if you want Script Extractor to ignore unconvertable bytes, just put this line somewhere in your table file:

?0
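
The three settings can be pictured with a few lines of Python.  This is only a sketch with made-up names (`parse_unknown_mode`, `render_unknown`), and for setting 2 it assumes the byte is shown as a Latin-1 character; what you'd really see depends on your system's character set.

```python
def parse_unknown_mode(line):
    """Return 0, 1, or 2 for a "?0", "?1", or "?2" line, else None."""
    line = line.strip()
    if len(line) == 2 and line[0] == "?" and line[1] in "012":
        return int(line[1])
    return None


def render_unknown(byte, mode=1):
    """Show a byte that has no table entry, according to the chosen mode."""
    if mode == 0:
        return ""  # ignore the byte
    if mode == 2:
        # Assumption: treat the byte as a Latin-1 character.
        return bytes([byte]).decode("latin-1")
    return "<%02X>" % byte  # default: hex in angle brackets


mode = parse_unknown_mode("?0")
print("Here comes an" + render_unknown(0xD8, mode) + "unknown byte.")
# prints Here comes anunknown byte.
```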

---------------------------------------------------------------------------
2.c.2.  Style for converting break characters
---------------------------------------------------------------------------
This is here to make your scripts more pleasant to read.  By default, SE will convert line breaks to returns, section breaks to double returns with a backslash, and message breaks to double returns.  (This is completely different from the way 1.0 converted breaks.)  For example:

Here's a line break.
Now a section break.
\
And a message break.

End of example.

There are two other settings for these conversions.  You set them with "?b:1" or "?b:2".  ("B" is for "break conversion style.")  Alternate 1 is like the default, but with spaces instead of returns for line breaks:

Here's a line break. Now a section break.
\
And a message break.

End of example.

Alternate 2 is the way SE 1.0 converted break characters:

Here's a line break. Now a section break.\And a message break.
End of example.
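
The three styles can be summed up as a small lookup table.  The Python sketch below is inferred from the examples above rather than taken from SE's source; `BREAK_STYLES` and `render` are made-up names, and style 0 stands for the default.

```python
# What each break turns into under each style (0 = default).
BREAK_STYLES = {
    0: {"line": "\n", "sect": "\n\\\n", "msg": "\n\n"},
    1: {"line": " ", "sect": "\n\\\n", "msg": "\n\n"},
    2: {"line": " ", "sect": "\\", "msg": "\n"},
}


def render(chunks, style=0):
    """Join (text, break_kind) pairs; break_kind may be None for no break."""
    marks = BREAK_STYLES[style]
    return "".join(text + marks.get(kind, "") for text, kind in chunks)


example = [
    ("Here's a line break.", "line"),
    ("Now a section break.", "sect"),
    ("And a message break.", "msg"),
    ("End of example.", None),
]
print(render(example, style=1))
```

Running it with style=1 reproduces the "Alternate 1" example; style=0 and style=2 give the other two.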

---------------------------------------------------------------------------
2.c.3.  Length limits
---------------------------------------------------------------------------
Used by Script Inserter.

---------------------------------------------------------------------------
2.c.4.  Pointer table address
---------------------------------------------------------------------------
Used by Script Inserter.

---------------------------------------------------------------------------
2.c.5.  Overwrite / append to output file
---------------------------------------------------------------------------
By default, SE overwrites your output file every time you run it.  If you want to dump several different blocks of text into one file (which I don't recommend, because it'll cause problems when you try to re-insert the text), you can set SE to append to the end of the file instead.  Just put this line in your table file:

?a

---------------------------------------------------------------------------
2.d.  Displaying program control characters
---------------------------------------------------------------------------
This is tricky to explain, so don't worry if you don't get it.  Some games use certain values to trigger special actions in the middle of displaying text messages.  These come two bytes at a time: First comes the trigger byte, then comes the data byte.  For example, in Radia, B6 makes the game change the speed it's writing the text to the screen.  In this example, the text will slow way down when it displays the three dots.

We're about to go <B6><3F>...<B6><04> slow.

Because these directions come two bytes at a time, and the second byte is used for data instead of as a displaying character, these directions would come out funny in a script dump.  For example, if 3F and 04 represent letters, your dump might come out something like this:

We're about to go <B6>n...<B6>D slow.

Script Extractor lets you tell it to dump two-byte commands like these as a single labeled unit, so the data byte isn't mistaken for a letter.  The command is "p: xx=label" (p is for program control character).  For example, my table file for Radia includes this line:

p: B6=Speed

The required elements are the "p", the colon, the value, the equals sign, and the label.  This line works just as well:

Prog. control char : B6 =Speed

With such a line in my table file, my script dump comes out like this:

We're about to go <Speed 3F>...<Speed 04> slow.
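
If it helps, here's the same idea as a Python sketch: read a "p: xx=label" line, then treat the byte after each trigger as data.  The names `parse_control` and `decode_with_controls` are made up, the little table (3F=n, 04=D, A0=.) is just for the demo, and none of this is SE's actual code.

```python
def parse_control(line):
    """Parse a "p: xx=label" line into (trigger_byte, label), or None."""
    line = line.strip()
    if not line.lower().startswith("p"):
        return None
    head, colon, rest = line.partition(":")
    value, equals, label = rest.partition("=")
    value = value.strip()
    if not colon or not equals or len(value) != 2:
        return None
    try:
        return bytes.fromhex(value)[0], label
    except ValueError:
        return None


def decode_with_controls(data, table, controls):
    """Convert bytes to text, dumping trigger+data pairs as "<label XX>"."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in controls and i + 1 < len(data):
            # The next byte is data for the command, not a letter.
            out.append("<%s %02X>" % (controls[byte], data[i + 1]))
            i += 2
        elif byte in table:
            out.append(table[byte])
            i += 1
        else:
            out.append("<%02X>" % byte)
            i += 1
    return "".join(out)


table = {0x3F: "n", 0x04: "D", 0xA0: "."}
data = bytes([0xB6, 0x3F, 0xA0, 0xA0, 0xA0, 0xB6, 0x04])
trigger, label = parse_control("Prog. control char : B6 =Speed")
print(decode_with_controls(data, table, {}))
# prints <B6>n...<B6>D
print(decode_with_controls(data, table, {trigger: label}))
# prints <Speed 3F>...<Speed 04>
```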

---------------------------------------------------------------------------
2.e.  Index
---------------------------------------------------------------------------
?0                        Treatment of unconvertable bytes   2.c.1
?1                        Treatment of unconvertable bytes   2.c.1
?2                        Treatment of unconvertable bytes   2.c.1
?append                   Append to output file              2.c.5
?break: x                 Style of output                    2.c.2
?line: x                  Sets maximum length for lines      2.c.3
?msg: x                   Sets maximum length for messages   2.c.3
?pointer table: xxxxxxxx  Gives pointer table's address      2.c.4
?sect: x                  Sets maximum length for sections   2.c.3
*xx                       Line break                         2.b
/xx                       Message break                      2.b
line: xx                  Line break                         2.b
msg: xx                   Message break                      2.b
prog cont char: xx=...    Program control character          2.d
sect: xx                  Section break                      2.b
xx=...                    Normal conversion                  2.a
xxxx=...                  Normal conversion, two bytes       2.a

---------------------------------------------------------------------------
3.  A final word
---------------------------------------------------------------------------
This probably seems pretty overwhelming.  There are example table files on my site you can download, look at, and use.  Keep an eye out for my detailed beginners' tutorial, coming sometime in the near future.  And as always, feel free to e-mail me with any questions, comments, or suggestions.


---------------------------------------------------------------------------
TERMS OF USE
---------------------------------------------------------------------------
This program is distributed with its source code.  You may use, distribute, and modify it freely.  Only two restrictions: These terms of use must stay the same, and you must always include the source code with the program.  Oh, and I'd appreciate it if you credit me as the original author.


---------------------------------------------------------------------------
DISCLAIMER
---------------------------------------------------------------------------
All games and systems mentioned are copyright their respective companies.  I am not responsible for any damage you may cause to your computer or software by using this program.  Owning a ROM is illegal unless you already own the cartridge.  If there are runners on first and second and fewer than two outs, a fly ball hit to an infielder shall be ruled "caught" even if the infielder drops it.  I think that about covers everything.

